GitHub - tavgar/test2222

This is a classic benchmarking challenge. You are absolutely right to be concerned about "noise" from the Operating System. If you use a standard base image (like ubuntu:latest), you will inherit hundreds of system packages that will clutter your ground truth.

To solve this, we will use a "Manifest-First" approach on a Minimal Base.

Here is the plan to build your Ground Truth Docker Image:

Select a Minimal Base: We will use Alpine Linux. It is tiny, so the "OS Noise" is minimal (manageable), but it still has a package manager (apk) that SCA tools can read.
Define Vulnerable Components: We will use Python for this example because it has a very clear dependency tree structure (pip). We will pick 7 direct components that trigger transitive dependencies and have known CVEs.
Generate the "Golden Record": We will run a command to export the exact list of installed packages (OS + Python) so you have your absolute truth.
Export to Tar: We will save it as a local artifact.

Phase 1: Define The Vulnerable Ground Truth

Create a file named requirements.txt. I have selected 7 direct dependencies. Because these are older versions, they will pull in about 10–15 transitive dependencies (dependencies of dependencies), and many contain critical vulnerabilities (CVEs) for your SCA tool to find.

requirements.txt

# 1. Django 2.2.24: Known SQL Injection vulnerabilities (CVE-2021-33203)
Django==2.2.24

# 2. PyYAML 5.3: Critical Deserialization vulnerability (CVE-2020-14343)
PyYAML==5.3

# 3. Requests 2.19.1: Old version, relies on vulnerable urllib3/idna (Transitive)
requests==2.19.1

# 4. Pillow 8.0.1: Contains buffer overflow vulnerabilities (CVE-2021-27921)
Pillow==8.0.1

# 5. Lxml 4.6.2: XML External Entity (XXE) vulnerabilities (CVE-2021-28957)
lxml==4.6.2

# 6. Flask 1.0.2: Old web framework, pulls in Werkzeug/Jinja2 (Transitive vulnerabilities)
Flask==1.0.2

# 7. Paramiko 2.7.1: SSH library, authentication bypass issues (Transitive deps like cryptography)
paramiko==2.7.1

Phase 2: The Dockerfile

We will build the image. We will also add a step that writes a "Bill of Materials" (BOM) to a file inside the image. This serves as your cheat sheet.

Dockerfile

# Use a specific hash of Alpine to ensure the OS layer never changes unexpectedly
# Python 3.9 on Alpine 3.14 (Old enough to have some OS vulnerabilities too)
FROM python:3.9-alpine3.14

WORKDIR /app

# COPY the vulnerable requirements
COPY requirements.txt .

# Install build dependencies (needed for Pillow/lxml), install python packages, 
# then delete build deps to keep the image small.
# Note: We keep the python packages installed.
RUN apk add --no-cache --virtual .build-deps gcc musl-dev libxml2-dev libxslt-dev jpeg-dev zlib-dev libffi-dev openssl-dev \
    && pip install --no-cache-dir -r requirements.txt \
    && apk del .build-deps

# --- GROUND TRUTH GENERATION ---
# This creates a file at /app/ground_truth_manifest.txt listing EVERYTHING.
# 1. List all OS level packages (APK)
# 2. List all Python Application packages (PIP)
RUN echo "=== OS PACKAGES (APK) ===" > ground_truth_manifest.txt \
    && apk info -vv >> ground_truth_manifest.txt \
    && echo -e "\n=== APP PACKAGES (PIP) ===" >> ground_truth_manifest.txt \
    && pip freeze >> ground_truth_manifest.txt

# Keep container running so you can inspect it if needed
CMD ["tail", "-f", "/dev/null"]

Phase 3: Build and Verify Ground Truth

Now we build the image and extract the answer key (the list of what is actually inside).

1. Build the Image

docker build -t sca-benchmark-v1 .

2. Extract your "Answer Key" You were worried about OS packages being included without your knowledge. This step pulls the manifest we created out of the image so you can see exactly what alpine included.

# Run a temporary container
docker run -d --name temp-bench sca-benchmark-v1

# Copy the manifest out to your local machine
docker cp temp-bench:/app/ground_truth_manifest.txt ./actual_ground_truth.txt

# Stop and remove the container
docker rm -f temp-bench

Open actual_ground_truth.txt.

The top section lists every OS library (glibc, ssl, etc.).
The bottom section lists your Python packages (Direct + Transitive).
This file is your Source of Truth for the benchmark.

Phase 4: Save as Local Tar File

SCA tools often accept a "saved" Docker image (a tarball) to simulate scanning a local artifact registry.

docker save -o sca-benchmark-v1.tar sca-benchmark-v1

You now have sca-benchmark-v1.tar. You can feed this file directly into tools like Trivy, Grype, Snyk, or JFrog Xray.

Summary of your Benchmark

Total Direct Components: 7 (from requirements.txt).
Total Transitive Components: ~15-20 (e.g., urllib3, certifi, chardet from Requests; Werkzeug, Jinja2 from Flask).
Vulnerabilities:
- Application Layer: High/Critical CVEs in Django, PyYAML, and Pillow.
- OS Layer: Since we used alpine3.14 (an older OS), there will likely be OS-level vulnerabilities (e.g., in busybox or ssl) that the tool should also find.

Would you like me to analyze the dependency tree for those specific Python packages right now, so you know exactly which transitive dependencies (children) to expect in your ground truth?

Name		Name	Last commit message	Last commit date
Latest commit History 3 Commits
README.md		README.md
c-unmanged.sh		c-unmanged.sh

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Phase 1: Define The Vulnerable Ground Truth

Phase 2: The Dockerfile

Phase 3: Build and Verify Ground Truth

Phase 4: Save as Local Tar File

Summary of your Benchmark

About

Uh oh!

Releases 4

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Phase 1: Define The Vulnerable Ground Truth

Phase 2: The Dockerfile

Phase 3: Build and Verify Ground Truth

Phase 4: Save as Local Tar File

Summary of your Benchmark

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases 4

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages